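This paper studies Binary Neural Networks (BNNs), in which both weights and activations are binarized into 1-bit values, greatly reducing memory usage and computational complexity. Because modern deep neural networks use sophisticated designs and complex architectures for the sake of accuracy, the distributions of their weights and activations are highly diverse. The conventional sign function is therefore poorly suited to binarizing full-precision values in BNNs. To this end, we propose a simple yet effective method called AdaBin that adaptively obtains the optimal binary set $\{b_1, b_2\}$ ($b_1, b_2 \in \mathbb{R}$) for weights and activations, instead of a fixed set (i.e., $\{-1, +1\}$). In this way, the proposed method better fits different distributions and increases the representation ability of binarized features. In practice, we use the center position and the distance of the 1-bit values to define a new binary quantization function. For the weights, we propose an equalization method that aligns the symmetric center of the binary distribution with that of the real-valued distribution and minimizes the Kullback-Leibler divergence between them. Meanwhile, we introduce a gradient-based optimization method to obtain these two parameters for the activations, which are trained jointly in an end-to-end manner. Experimental results on benchmark models and datasets show that the proposed AdaBin achieves state-of-the-art performance. For instance, we obtain 66.4% Top-1 accuracy on ImageNet with the ResNet-18 architecture, and 69.4 mAP on PASCAL VOC with SSD300.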
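The binary quantization function described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract only says the function is defined by a center and a distance, so the choice of the weight mean as the center and the weight standard deviation as the distance is an assumption, and the function names `adaptive_binarize` and `binarize_weights` are hypothetical.

```python
import numpy as np

def adaptive_binarize(x, center, distance):
    """Map every value to one of two levels: center - distance or center + distance."""
    return np.where(x >= center, center + distance, center - distance)

def binarize_weights(w):
    """Binarize weights to an adaptive set {b1, b2}.

    Assumption (not stated in the abstract): center = mean, distance = std,
    so the binary distribution shares the mean and spread of the real one.
    """
    center = float(np.mean(w))
    distance = float(np.std(w))
    return adaptive_binarize(w, center, distance), (center - distance, center + distance)

w = np.array([0.2, 0.4, 1.0, 1.2])
w_bin, (b1, b2) = binarize_weights(w)
print(w_bin, b1, b2)
```

With the fixed set $\{-1, +1\}$ all four of these positive weights would collapse to $+1$; the adaptive set keeps two distinct levels centered on the weight mean.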
translated by Google Translate
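Although the abuse of deepfake technology has recently raised serious concerns, detecting deepfake videos remains a challenge because each frame is synthesized with high photo-realism. Existing image-level approaches usually focus on single frames and ignore the spatiotemporal cues hidden in deepfake videos, which leads to poor generalization and robustness. The key to a video-level detector is to fully exploit the spatiotemporal inconsistency distributed over local facial regions across different frames of a deepfake video. Inspired by this, the paper proposes a simple yet effective patch-level approach that facilitates deepfake video detection via a spatiotemporal dropout transformer. The approach reorganizes each input video into a bag of patches, which is then fed into a vision transformer to obtain a robust representation. Specifically, a spatiotemporal dropout operation is proposed to fully explore patch-level spatiotemporal cues and to serve as an effective data augmentation that further enhances the model's robustness and generalization ability. The operation is flexible and can easily be plugged into existing vision transformers. Extensive experiments demonstrate the effectiveness of our approach against 25 state-of-the-art methods, with impressive robustness, generalizability, and representation ability.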
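Assessing the blurriness of an object image is essential for improving the performance of object recognition and retrieval. The main challenge lies in the lack of abundant images with reliable labels and of effective learning strategies. Current datasets are labeled with a limited number of easily confused quality levels. To overcome this limitation, we propose to label the rank relationship between pairs of images rather than their quality levels, since this is much easier for humans to label, and we establish a large-scale realistic face image blur assessment dataset with reliable labels. Based on this dataset, we propose a method that obtains blur scores using only the pairwise rank labels as supervision. Moreover, to further improve performance, we propose a self-supervised method based on quadruplet ranking consistency that exploits unlabeled data more effectively. The supervised and self-supervised methods together form the final semi-supervised learning framework, which can be trained end to end. Experimental results demonstrate the effectiveness of our method.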
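To make the idea of learning blur scores from pairwise rank labels concrete, here is a minimal sketch of a pairwise ranking loss. The abstract does not specify the loss, so the logistic form used here is an assumption, as are the function name `pairwise_rank_loss` and the label convention (1 means the first image is blurrier).

```python
import numpy as np

def pairwise_rank_loss(score_a, score_b, a_is_blurrier):
    """Logistic ranking loss on predicted blur scores.

    a_is_blurrier[i] == 1 means image a of pair i was labeled blurrier than
    image b; any other value means b was labeled blurrier (assumed convention).
    """
    diff = np.asarray(score_a, dtype=float) - np.asarray(score_b, dtype=float)
    sign = np.where(np.asarray(a_is_blurrier) == 1, 1.0, -1.0)
    # log(1 + exp(-sign * diff)), computed in a numerically stable way
    return float(np.mean(np.logaddexp(0.0, -sign * diff)))

# Scores that agree with the label give a small loss, reversed scores a large one.
agree = pairwise_rank_loss([2.0], [0.0], [1])
disagree = pairwise_rank_loss([0.0], [2.0], [1])
print(agree, disagree)
```

Only the ordering of each pair enters the loss, which is why no absolute quality level has to be annotated.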
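Recently, deep learning-based image denoising methods have achieved promising performance on test data that follow the same distribution as the training set, and various denoising models have been learned from synthetic or collected real-world training data. However, denoising performance remains limited when handling real-world noisy images. In this paper, we propose a simple yet effective Bayesian deep ensemble (BDE) method for real-world image denoising, in which several representative deep denoisers pre-trained under various training data settings are fused to improve robustness. The foundation of BDE is that real-world image noise is highly signal-dependent, and the heterogeneous noise in a real-world noisy image can be handled separately by different denoisers. In particular, we take well-trained CBDNet, NBNet, HINet, Uformer, and GMSNet into the denoiser pool, and adopt a U-Net to predict pixel-wise weighting maps that fuse these denoisers. Instead of learning only the pixel-wise weighting maps, we introduce a Bayesian deep learning strategy that predicts the weighting uncertainty as well as the weighting map, so that the prediction variance can be modeled to improve robustness on real-world noisy images. Extensive experiments show that real-world noise can be removed better by fusing existing denoisers than by training one large denoiser at high cost. On the DND dataset, our BDE achieves a PSNR gain of +0.28 dB over the state-of-the-art denoising method. Moreover, we note that a BDE denoiser based on different Gaussian noise levels outperforms the state-of-the-art CBDNet when applied to real-world noisy images. Furthermore, our BDE can be extended to other image restoration tasks, achieving PSNR gains of +0.30 dB, +0.18 dB, and +0.12 dB on benchmark datasets for image deblurring, image deraining, and single image super-resolution, respectively.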
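The fusion step, in which pixel-wise weighting maps combine the outputs of several denoisers, can be sketched as follows. This is only an illustration under stated assumptions: the weighting maps are taken as given logits rather than predicted by a U-Net, they are normalized with a per-pixel softmax (the abstract does not say how the maps are normalized), and the Bayesian uncertainty prediction is left out entirely. The name `fuse_denoisers` is hypothetical.

```python
import numpy as np

def fuse_denoisers(outputs, weight_logits):
    """Fuse K denoised images with per-pixel weights.

    outputs:       array of shape (K, H, W), one denoised image per denoiser
    weight_logits: array of shape (K, H, W), unnormalized per-pixel weights
    Returns the fused image (H, W) and the normalized weights (K, H, W).
    """
    outputs = np.asarray(outputs, dtype=float)
    logits = np.asarray(weight_logits, dtype=float)
    # Per-pixel softmax over the denoiser axis (assumed normalization)
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)
    return (weights * outputs).sum(axis=0), weights

# Two toy "denoiser outputs" on a 2x2 image, fused with equal weights.
outs = np.stack([np.zeros((2, 2)), np.ones((2, 2))])
fused, weights = fuse_denoisers(outs, np.zeros((2, 2, 2)))
print(fused)
```

Because the weights vary per pixel, different regions of one image can rely on different denoisers, which is the point the abstract makes about heterogeneous noise.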
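A novel model called the error loss network (ELN) is proposed to build an error loss function for supervised learning. The ELN is similar in structure to a radial basis function (RBF) neural network, but its input is an error sample and its output is the loss corresponding to that error sample. This means that the nonlinear input-output mapper of the ELN creates an error loss function. The proposed ELN provides a unified model for a large class of error loss functions, which includes some information theoretic learning (ITL) loss functions as special cases. The activation function, weight parameters, and network size of the ELN can be predetermined or learned from the error samples. On this basis, we propose a new machine learning paradigm in which the learning process is divided into two stages: first, a loss function is learned using an ELN; second, the learned loss function is used to continue the learning. Experimental results are presented to demonstrate the desirable performance of the new method.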
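The mapping from an error sample to a loss value can be sketched as an RBF-style network with a scalar input. This is a minimal sketch under stated assumptions: Gaussian activations are assumed (the abstract says the activation function may be predetermined or learned), the parameters are fixed by hand rather than learned, and the name `eln_loss` is hypothetical.

```python
import numpy as np

def eln_loss(errors, centers, widths, weights):
    """RBF-style error loss: loss(e) = sum_j weights[j] * exp(-(e - c_j)^2 / (2 s_j^2)).

    errors:  error samples, shape (N,)
    centers, widths, weights: hidden-unit parameters, shape (M,)
    Returns one loss value per error sample, shape (N,).
    """
    e = np.asarray(errors, dtype=float)[:, None]
    c = np.asarray(centers, dtype=float)[None, :]
    s = np.asarray(widths, dtype=float)[None, :]
    phi = np.exp(-((e - c) ** 2) / (2.0 * s ** 2))  # Gaussian activations (assumed)
    return phi @ np.asarray(weights, dtype=float)

# One unit centered at zero with weight -1 gives the negative Gaussian kernel
# of the error, so minimizing its mean maximizes a correntropy-type criterion.
losses = eln_loss([0.0, 1.0, 3.0], centers=[0.0], widths=[1.0], weights=[-1.0])
print(losses)
```

Adding more units with different centers, widths, and weights yields other loss shapes from the same model.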
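In recent years, correntropy has been successfully applied to robust adaptive filtering to eliminate the adverse effects of impulsive noise or outliers. Correntropy is generally defined as the expectation of a Gaussian kernel between two random variables. This definition is reasonable when the error between the two random variables is symmetrically distributed around zero. When the error distribution is asymmetric, however, the symmetric Gaussian kernel is inappropriate and cannot adapt well to the error distribution. To address this problem, in this brief we propose a new variant of correntropy, named asymmetric correntropy, which uses an asymmetric Gaussian model as the kernel function. In addition, a robust adaptive filtering algorithm based on asymmetric correntropy is developed, and its steady-state convergence performance is analyzed. Simulations are provided to confirm the theoretical results and the good performance of the proposed algorithm.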
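The difference between the symmetric and the asymmetric kernel can be illustrated with a small sketch. The exact form of the asymmetric Gaussian model is not given in the abstract, so this sketch assumes a Gaussian with one width for non-negative errors and another for negative errors, omits normalization constants, and estimates the expectation by a sample mean. The function names and the parameters `sigma_pos` and `sigma_neg` are hypothetical.

```python
import numpy as np

def asymmetric_gaussian_kernel(e, sigma_pos, sigma_neg):
    """Gaussian kernel with width sigma_pos for e >= 0 and sigma_neg for e < 0 (assumed form)."""
    e = np.asarray(e, dtype=float)
    sigma = np.where(e >= 0, sigma_pos, sigma_neg)
    return np.exp(-(e ** 2) / (2.0 * sigma ** 2))

def asymmetric_correntropy(x, y, sigma_pos, sigma_neg):
    """Sample-mean estimate of the expected kernel value of the error x - y."""
    err = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean(asymmetric_gaussian_kernel(err, sigma_pos, sigma_neg)))

# With a wider positive side, an error of +1 is penalized less than an error of -1.
print(asymmetric_gaussian_kernel([1.0, -1.0], sigma_pos=2.0, sigma_neg=1.0))
```

Setting `sigma_pos` equal to `sigma_neg` recovers the usual symmetric Gaussian kernel.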
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Knowledge graphs (KGs) serve as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations, which have few known triples for training. In light of this, few-shot KG completion (FKGC), which draws on the strengths of graph representation learning and few-shot learning, has been proposed to tackle the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. We then systematically categorize and summarize existing work in terms of the type of KG and the methods used. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; for example, we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. It is therefore difficult to deploy them on edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution for compressing GNNs, in which a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods do not take fairness into consideration. As a consequence, the student model usually inherits and even exaggerates the bias of the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. We then propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and can thus be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
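For readers unfamiliar with knowledge distillation, the way a student is encouraged to mimic a teacher can be sketched with the common temperature-softened KL loss. This illustrates plain distillation only; it is not RELIANT and contains no fairness mechanism. The logits are assumed to be given, and the function names and the temperature parameter `T` are hypothetical choices for this sketch.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Row-wise softmax of logits divided by temperature T."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Mean KL divergence from the softened teacher distribution to the student's."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl))

teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[0.1, 0.2, 0.3]])
print(distillation_loss(student, teacher))
```

Because the student is trained to reproduce the teacher's output distribution, any bias encoded in that distribution is part of the training target, which is the issue the abstract raises.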